Skip to content

Conversation

aaronsteers
Copy link
Contributor

@aaronsteers aaronsteers commented Oct 7, 2025

Fix namespace null handling for low-code connectors with S3-data-lake destination

Summary

Fixes PyAirbyte namespace handling issue where low-code connectors (like source-tiktok-marketing and source-snapchat-marketing) fail when writing to destination-s3-data-lake with the error "streams.0.stream.namespace: Null value is not allowed".

The fix adds exclude_none=True to catalog serialization in both destinations/base.py and sources/base.py. This makes Pydantic omit null fields entirely from JSON output instead of serializing them as explicit nulls, which the destination CDK can handle.

Root cause: When PyAirbyte serializes ConfiguredAirbyteCatalog to JSON for passing to connectors, fields with None values (like namespace) were being serialized as explicit JSON null values ("namespace": null). The destination CDK validation rejects explicit null values but accepts omitted fields.

Prior work: This exact fix was implemented in commit 4c5a6d2 by AJ Steers but was somehow lost in subsequent changes.

Review & Testing Checklist for Human

This is a moderate risk change that requires end-to-end testing:

  • Test with actual low-code connectors: Verify source-tiktok-marketing or source-snapchat-marketing can successfully write to destination-s3-data-lake without the "Null value is not allowed" error
  • Verify JSON serialization behavior: Inspect the actual JSON catalog passed to connectors to confirm null namespace fields are omitted (not serialized as "namespace": null)
  • Regression testing: Test other connector types (non-low-code) writing to various destinations to ensure no new issues are introduced
  • CI verification: Ensure all tests pass, especially any integration tests that involve catalog serialization

Notes

Summary by CodeRabbit

  • Bug Fixes
    • Catalog JSON output now omits null fields, producing cleaner and smaller files. This improves compatibility with tools that mis-handle null values and reduces noise in diffs and logs. Applies to both emitted catalogs during reads and temporary files generated by destinations. Existing behavior and error handling remain unchanged.

…m catalog serialization

Re-applies fix from commit 4c5a6d2 that was previously implemented but lost.

When PyAirbyte serializes ConfiguredAirbyteCatalog to JSON for passing to connectors,
fields with None values (like namespace) were being serialized as explicit null values.
The destination CDK validation is strict and rejects explicit null values, but accepts
omitted fields entirely.

Using exclude_none=True causes Pydantic to omit null fields from JSON output instead
of serializing them as explicit nulls, which resolves the issue.

Fixes: https://airbytehq-team.slack.com/archives/C06FZ238P8W/p1759500714217339
Devin run: https://app.devin.ai/sessions/6f299c861622435da6fbd89f3aeba145

Co-Authored-By: AJ Steers <[email protected]>
Copy link
Contributor

Original prompt from AJ Steers
Received message in Slack channel #ask-devin-ai:

@Devin - Can you try to resolve this? We ran into it once before - maybe with a different field? Please find the prior PyAirbyte PR that fixed something similar (related to more fragile null handling in the new destination CDK). IIRC, the destination can tolerate an omitted key but not a key declared with a null value.
Thread URL: https://airbytehq-team.slack.com/archives/C08BHPUMEPJ/p1759870325547589?thread_ts=1759870325.547589

Quote of conversation (https://airbytehq-team.slack.com/archives/C06FZ238P8W/p1759500714217339?thread_ts=1759500714.217339&cid=C06FZ238P8W):
> From Aymeric Creusot
> Hello there, we're currently facing a problem trying to use the destination S3-data-lake with pyairbyte.

> It seems that no namespace is passed to the source catalog on low_code connecter such as source-tiktok-marketing or source-snapchat-marketing (didn't try other type of conn).
> Resulting with an error when `destination.write(source)` :
> ```streams.0.stream.namespace: Null value is not allowed. (code: 1021)
> From: streams.0.&lt;items&gt;.&lt;#/definitions/ConfiguredAirbyteStream&gt;.stream.&lt;#/definitions/AirbyteStream&gt;.namespace.&lt;nullable&gt;```
> While on airbyte OSS, connection seem to work fine with the same config as pyairbyte.
> I am missing something ?
> Posted on October 03, 2025 at 02:11 PM

Copy link
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Copy link

github-actions bot commented Oct 7, 2025

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This PyAirbyte Version

You can test this version of PyAirbyte using the following:

# Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1759870941-fix-namespace-null-handling' pyairbyte --help

# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1759870941-fix-namespace-null-handling'

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /fix-pr - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test-pr - Runs tests with the updated PyAirbyte

Community Support

Questions? Join the #pyairbyte channel in our Slack workspace.

📝 Edit this welcome message.

@aaronsteers
Copy link
Contributor Author

Devin, reply here with context: Was this fixed prior but reverted? Did we not merge the prior fix? Or was the fix applicable to a different codepath or model?

Copy link
Contributor

devin-ai-integration bot commented Oct 7, 2025

Hi @aaronsteers,

Based on my investigation of the git history:

Yes, this was previously fixed in PR #748 but that PR never merged.

The previous attempt:

Current state on main branch:

  • Both destinations/base.py (line 268) and sources/base.py (line 755) are missing exclude_none=True when serializing catalogs
  • This causes null namespace fields to be serialized as explicit "namespace": null in the JSON
  • The destination CDK validation rejects explicit nulls but accepts omitted fields

This PR (#825):

The issue is occurring because without exclude_none=True, Pydantic serializes None values as explicit JSON nulls, which the strict destination CDK validation rejects.

Let me know if you need any additional context!


Requested by @aaronsteers
Devin run: https://app.devin.ai/sessions/6f299c861622435da6fbd89f3aeba145

Copy link
Contributor

coderabbitai bot commented Oct 7, 2025

📝 Walkthrough

Walkthrough

Updated JSON serialization for configured catalogs in sources and destinations to omit None-valued fields by adding exclude_none=True to model_dump_json() calls. No control flow or error handling changes.

Changes

Cohort / File(s) Summary of Changes
Catalog serialization behavior
airbyte/destinations/base.py, airbyte/sources/base.py
Switched catalog serialization to catalog.model_dump_json(exclude_none=True), causing None fields to be omitted from emitted JSON; no other logic modified.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Would you like to add a brief note in release docs that catalog JSON may now omit None fields to inform integrators, wdyt?

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The provided title clearly and concisely summarizes the main change by indicating the use of Pydantic’s exclude_none=True to omit null namespace fields during catalog serialization, which aligns perfectly with the modifications in the PR.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch devin/1759870941-fix-namespace-null-handling

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
airbyte/sources/base.py (1)

755-755: Consider applying exclude_none=True to other model_dump_json calls
We found additional calls (state_writers, state_providers, _state_backend, _message_iterators, _executors/base) lacking exclude_none. Would you add exclude_none=True for consistency and to omit None fields? wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 99d2ccf and 3828404.

📒 Files selected for processing (2)
  • airbyte/destinations/base.py (1 hunks)
  • airbyte/sources/base.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/destinations/base.py (2)
airbyte/shared/sql_processor.py (1)
  • catalog_provider (226-242)
airbyte/shared/catalog_providers.py (1)
  • configured_catalog (70-72)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
  • GitHub Check: Pytest (All, Python 3.10, Windows)
  • GitHub Check: Pytest (No Creds)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (1)
airbyte/destinations/base.py (1)

268-268: LGTM! Null handling fix looks good.

This correctly adds exclude_none=True to omit None-valued fields from the catalog JSON, which should resolve the "namespace: Null value is not allowed" error when writing to destination-s3-data-lake. The change aligns with how the destination CDK expects catalog data (omitted fields vs. explicit nulls).

Copy link

github-actions bot commented Oct 7, 2025

PyTest Results (Fast Tests Only, No Creds)

304 tests  ±0   304 ✅ ±0   4m 26s ⏱️ +11s
  1 suites ±0     0 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit 3828404. ± Comparison against base commit 99d2ccf.

Copy link

github-actions bot commented Oct 7, 2025

PyTest Results (Full)

368 tests  ±0   352 ✅ ±0   21m 40s ⏱️ + 1m 2s
  1 suites ±0    16 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit 3828404. ± Comparison against base commit 99d2ccf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant